patient simulator
Baichuan-M2: Scaling Medical Capability with Large Verifier System
M2 Team, Dou, Chengfeng, Liu, Chong, Yang, Fan, Li, Fei, Jia, Jiyuan, Chen, Mingyang, Ju, Qiang, Wang, Shuai, Dang, Shunya, Li, Tianpeng, Zeng, Xiangrong, Zhou, Yijie, Zhu, Chenzheng, Pan, Da, Deng, Fei, Ai, Guangwei, Dong, Guosheng, Zhang, Hongda, Tai, Jinyang, Hong, Jixiang, Lu, Kai, Sun, Linzhuang, Guo, Peidong, Ma, Qian, Xin, Rihui, Yang, Shihui, Zhang, Shusen, Mo, Yichuan, Liang, Zheng, Zhang, Zhishou, Cui, Hengfu, Zhu, Zuyi, Wang, Xiaochuan
As large language models (LLMs) advance in conversational and reasoning capabilities, their practical application in healthcare has become a critical research focus. However, there is a notable gap between the performance of medical LLMs on static benchmarks such as USMLE and their utility in real-world clinical decision-making. This discrepancy arises because traditional exams fail to capture the dynamic, interactive nature of medical consultations. To address this challenge, we introduce a novel dynamic verification framework that moves beyond static answer verification, establishing a large-scale, high-fidelity interactive reinforcement learning system. Our framework comprises two key components: a Patient Simulator that creates realistic clinical environments using de-identified medical records, and a Clinical Rubrics Generator that dynamically produces multi-dimensional evaluation metrics. Building on this foundation, we develop Baichuan-M2, a 32B-parameter medical augmented reasoning model trained through a multi-stage reinforcement learning strategy with an improved Group Relative Policy Optimization (GRPO) algorithm. Evaluated on HealthBench, Baichuan-M2 outperforms all other open-source models and most advanced closed-source counterparts, achieving a score above 32 on the challenging HealthBench Hard benchmark, a threshold previously exceeded only by GPT-5. Our work demonstrates that a robust dynamic verifier system is essential for aligning LLM capabilities with practical clinical applications, establishing a new Pareto front in the performance-parameter trade-off for medical AI deployment.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Health Care Technology > Medical Record (0.88)
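The Baichuan-M2 abstract above names an improved GRPO algorithm without detailing it, but the core group-relative idea is simple to sketch: sample several responses per prompt, score each with the verifier, and normalize each reward against the group's own mean and standard deviation, so no separate value network is needed. A minimal illustration (not Baichuan-M2's actual implementation; the scores are made up):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one group of sampled responses:
    each reward is normalized by the group's mean and standard
    deviation, removing the need for a learned value baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled consultations for one prompt, scored by a rubric
# verifier (illustrative values).
advantages = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Responses scoring above the group mean receive positive advantages and are reinforced; below-average responses are penalized.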
LLM-Powered Virtual Patient Agents for Interactive Clinical Skills Training with Automated Feedback
Voigt, Henrik, Sugamiya, Yurina, Lawonn, Kai, Zarrieß, Sina, Takanishi, Atsuo
Objective Structured Clinical Examinations (OSCEs) are essential for medical training, but they require significant resources, including professional actors and expert medical feedback. Although Large Language Models (LLMs) have introduced text-based virtual patients for communication practice, these simulations often lack the capability for richer, non-textual interactions. This paper presents a novel framework that significantly enhances LLM-based simulated patients by equipping them with action spaces, thereby enabling more realistic and dynamic patient behaviors that extend beyond text. Furthermore, our system incorporates virtual tutors that provide students with instant, personalized feedback on their performance at any time during these simulated encounters. We have conducted a rigorous evaluation of the framework's real-time performance, including system latency and component accuracy. Preliminary evaluations with medical experts assessed the naturalness and coherence of the simulated patients, as well as the usefulness and appropriateness of the virtual tutor's assessments. This innovative system provides medical students with a low-cost, accessible platform for personalized OSCE preparation at home.
- Health & Medicine (1.00)
- Education > Curriculum > Subject-Specific Education (0.87)
- Education > Educational Technology > Educational Software > Computer Based Training (0.69)
- Education > Educational Setting (0.68)
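The key extension in the OSCE paper above is giving the simulated patient an action space beyond text. One way to represent such a turn, purely as an illustrative sketch (the field and action names here are assumptions, not the paper's schema):

```python
from dataclasses import dataclass, field

@dataclass
class PatientAction:
    """One simulated-patient turn: an utterance plus optional
    non-verbal actions (all names here are illustrative)."""
    utterance: str = ""
    gesture: str = ""                       # e.g. "wince", "clutch_abdomen"
    vital_changes: dict = field(default_factory=dict)

# A turn where the patient speaks, winces, and breathes faster.
action = PatientAction(
    utterance="It hurts when I breathe in.",
    gesture="wince",
    vital_changes={"respiratory_rate": +4},
)
```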
AI Agents for Conversational Patient Triage: Preliminary Simulation-Based Evaluation with Real-World EHR Data
Rashidian, Sina, Li, Nan, Amar, Jonathan, Lee, Jong Ha, Pugh, Sam, Yang, Eric, Masterson, Geoff, Cha, Myoung, Jia, Yugang, Vaid, Akhil
Background: We present a Patient Simulator that leverages real-world patient encounters covering a broad range of conditions and symptoms to provide synthetic test subjects for the development and testing of healthcare agentic models. The simulator provides a realistic approach to patient presentation and multi-turn conversation with a symptom-checking agent. Objectives: (1) To construct and instantiate a Patient Simulator to train and test an AI health agent, based on patient vignettes derived from real EHR data. (2) To test the validity and alignment of the simulated encounters provided by the Patient Simulator against expert human clinical providers. (3) To illustrate the evaluation framework of such an LLM system on the generated realistic, data-driven simulations, yielding a preliminary assessment of our proposed system. Methods: We first constructed realistic clinical scenarios by deriving patient vignettes from real-world EHR encounters. These vignettes cover a variety of presenting symptoms and underlying conditions. We then evaluated the performance of the Patient Simulator as a simulacrum of a real patient encounter across over 500 different patient vignettes. We leveraged a separate AI agent to pose multi-turn questions and obtain a history of present illness. The resulting multi-turn conversations were evaluated by two expert clinicians. Results: Clinicians scored the Patient Simulator as consistent with the patient vignettes in 97.7% of cases. The extracted case summary based on the conversation history was 99% relevant. Conclusions: We developed a methodology to incorporate vignettes derived from real healthcare patient data to build a simulation of patient responses to symptom-checking agents. The performance and alignment of this Patient Simulator could be used to train and test a multi-turn conversational AI agent at scale.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Texas > Dallas County > Dallas (0.04)
- North America > United States > California > San Mateo County > San Bruno (0.04)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Health Care Technology (0.95)
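The vignette-driven simulation loop described in the abstract above can be caricatured in a few lines. This is only a toy sketch: a keyword lookup stands in for the LLM that paraphrases the EHR-derived vignette, and the vignette fields are invented for illustration:

```python
def make_patient(vignette):
    """A toy vignette-driven patient: answer a doctor's question by
    matching it against topics in the vignette (a keyword lookup
    stands in for the LLM that paraphrases the record)."""
    def respond(question):
        q = question.lower()
        for topic, answer in vignette.items():
            if topic in q:
                return answer
        return "I'm not sure, doctor."
    return respond

patient = make_patient({
    "pain": "It started two days ago, sharp, in my lower right side.",
    "fever": "Yes, around 38.5 since last night.",
    "medication": "Only ibuprofen, and it barely helps.",
})
reply = patient("Do you have a fever?")
```

A symptom-checking agent would drive this loop for many turns, and the resulting transcript is what the clinicians in the study graded for consistency with the vignette.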
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators
Liu, Zhaocheng, Tu, Quan, Ye, Wen, Xiao, Yu, Zhang, Zhishou, Cui, Hengfu, Zhu, Yalun, Ju, Qiang, Li, Shizheng, Xie, Jian
Online medical consultation (OMC) restricts doctors to gathering patient information solely through inquiries, making the already complex sequential decision-making process of diagnosis even more challenging. Recently, the rapid advancement of large language models has demonstrated significant potential to transform OMC. However, most studies have primarily focused on improving diagnostic accuracy under conditions of relatively sufficient information, while paying limited attention to the "inquiry" phase of the consultation process. This lack of focus has left the relationship between "inquiry" and "diagnosis" insufficiently explored. In this paper, we first extract real patient interaction strategies from authentic doctor-patient conversations and use these strategies to guide the training of a patient simulator that closely mirrors real-world behavior. By inputting medical records into our patient simulator to simulate patient responses, we conduct extensive experiments to explore the relationship between "inquiry" and "diagnosis" in the consultation process. Experimental results demonstrate that inquiry and diagnosis adhere to Liebig's law: poor inquiry quality limits the effectiveness of diagnosis, regardless of diagnostic capability, and vice versa. Furthermore, the experiments reveal significant differences in the inquiry performance of various models. To investigate this phenomenon, we categorize the inquiry process into four types: (1) chief complaint inquiry; (2) specification of known symptoms; (3) inquiry about accompanying symptoms; and (4) gathering family or medical history. We analyze the distribution of inquiries across the four types for different models to explore the reasons behind their significant performance differences. We plan to open-source the weights and related code of our patient simulator at https://github.com/LIO-H-ZEN/PatientSimulator.
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Beijing > Beijing (0.05)
- Asia > Taiwan (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (0.68)
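The four inquiry types named in the abstract above lend themselves to a simple bucketing pass over a transcript. A keyword-cue sketch follows; the cue lists are invented for illustration, and the paper's actual categorization is model-based, not rule-based:

```python
# Illustrative keyword cues for the four inquiry types; a real
# categorizer would use a model, not string matching.
INQUIRY_CUES = {
    "chief_complaint":       ["brings you in", "main concern", "what's wrong"],
    "known_symptom_detail":  ["how long", "how severe", "where exactly"],
    "accompanying_symptoms": ["any other symptoms", "anything else", "besides"],
    "history":               ["family history", "medical history", "past illness"],
}

def classify_inquiry(question):
    """Map a doctor's question to one of the four inquiry types."""
    q = question.lower()
    for inquiry_type, cues in INQUIRY_CUES.items():
        if any(cue in q for cue in cues):
            return inquiry_type
    return "other"
```

Counting the per-type frequencies over each model's consultations gives the distribution the authors compare across models.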
From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching
Yang, Eric, Garcia, Tomas, Williams, Hannah, Kumar, Bhawesh, Ramé, Martin, Rivera, Eileen, Ma, Yiran, Amar, Jonathan, Catalani, Caricia, Jia, Yugang
Effective management of cardiometabolic conditions requires sustained positive nutrition habits, often hindered by complex and individualized barriers. Direct human management is simply not scalable, while previous attempts aimed at automating nutrition coaching lack the personalization needed to address these diverse challenges. This paper introduces a novel LLM-powered agentic workflow designed to provide personalized nutrition coaching by directly targeting and mitigating patient-specific barriers. Grounded in behavioral science principles, the workflow leverages a comprehensive mapping of nutrition-related barriers to corresponding evidence-based strategies. A specialized LLM agent intentionally probes for and identifies the root cause of a patient's dietary struggles. Subsequently, a separate LLM agent delivers tailored tactics designed to overcome those specific barriers with patient context. We designed and validated our approach through a user study with individuals with cardiometabolic conditions, demonstrating the system's ability to accurately identify barriers and provide personalized guidance. Furthermore, we conducted a large-scale simulation study, grounded in real patient vignettes and expert-validated metrics, to evaluate the system's performance across a wide range of scenarios. Our findings demonstrate the potential of this LLM-powered agentic workflow to improve nutrition coaching by providing personalized, scalable, and behaviorally informed interventions.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Switzerland (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.93)
- Education (0.68)
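The two-agent workflow in the abstract above (one agent identifies the barrier, a second maps it to a tactic) can be caricatured as a lookup over a barrier-to-tactic table. This is a toy sketch: the table stands in for the paper's behavioral-science knowledge base, and all entries are invented:

```python
# Illustrative barrier -> tactic table; the paper's mapping is a
# curated, evidence-based knowledge base, not this dictionary.
BARRIER_TACTICS = {
    "time_pressure":  "Suggest 10-minute batch-prep recipes for the week.",
    "cost":           "Recommend seasonal produce and frozen vegetables.",
    "low_motivation": "Agree on one small, measurable goal for three days.",
}

def coach(identified_barrier):
    """Second-stage agent: map the identified barrier to a tactic,
    or fall back to further root-cause probing."""
    return BARRIER_TACTICS.get(
        identified_barrier,
        "Keep probing to identify the root-cause barrier.",
    )
```

Separating probing from tactic delivery is the design point: the first agent's only job is to land on a key this table (or its LLM equivalent) can act on.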
Manikin-Recorded Cardiopulmonary Sounds Dataset Using Digital Stethoscope
Torabi, Yasaman, Shirani, Shahram, Reilly, James P.
Heart and lung sounds are crucial for healthcare monitoring. Recent improvements in stethoscope technology have made it possible to capture patient sounds with enhanced precision. In this dataset, we used a digital stethoscope to capture both heart and lung sounds, including individual and mixed recordings. To our knowledge, this is the first dataset to offer both separate and mixed cardiorespiratory sounds. The recordings were collected from a clinical manikin, a patient simulator designed to replicate human physiological conditions, generating clean heart and lung sounds at different body locations. This dataset includes both normal sounds and various abnormalities (i.e., murmur, atrial fibrillation, tachycardia, atrioventricular block, third and fourth heart sounds, wheezing, crackles, rhonchi, pleural rub, and gurgling sounds). The dataset includes audio recordings of chest examinations performed at different anatomical locations, as determined by specialist nurses. Each recording has been enhanced using frequency filters to highlight specific sound types. This dataset is useful for applications in artificial intelligence, such as automated cardiopulmonary disease detection, sound classification, unsupervised separation techniques, and deep learning algorithms related to audio signal processing.
- North America > Canada > Ontario > Hamilton (0.14)
- North America > United States > California > Los Angeles County > Northridge (0.04)
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Liao, Yusheng, Meng, Yutong, Wang, Yuhao, Liu, Hongcheng, Wang, Yanfeng, Wang, Yu
Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored. Previous works mainly focus on performance on medical knowledge examinations, which are far from realistic scenarios and fall short in assessing the abilities of LLMs on clinical tasks. To enhance the application of LLMs in healthcare, this paper introduces the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS), targeting the gap between traditional LLM evaluations and the nuanced demands of clinical practice. Unlike prior methods that rely on static medical knowledge assessments, AIE and SAPS provide a dynamic, realistic platform for assessing LLMs through multi-turn doctor-patient simulations. This approach offers a closer approximation to real clinical scenarios and allows for a detailed analysis of LLM behaviors in response to complex patient interactions. Our extensive experimental validation demonstrates the effectiveness of the AIE framework, with outcomes that align well with human evaluations, underscoring its potential to revolutionize medical LLM testing for improved healthcare delivery.
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
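What makes the SAPS abstract's simulator "state-aware" is that its replies depend on the conversation so far, not just the current question. A minimal sketch of that idea, assuming nothing about the paper's actual state machine beyond tracking what has already been disclosed:

```python
class StatefulPatient:
    """A toy state-aware simulated patient: it remembers which facts
    it has already disclosed, so a repeated question gets a
    state-dependent reply instead of a verbatim repeat."""

    def __init__(self, facts):
        self.facts = facts        # topic -> answer
        self.disclosed = set()    # topics already revealed

    def respond(self, topic):
        if topic not in self.facts:
            return "No, nothing like that."
        if topic in self.disclosed:
            return "As I said before, " + self.facts[topic]
        self.disclosed.add(topic)
        return self.facts[topic]

patient = StatefulPatient({"cough": "a dry cough for about a week"})
first = patient.respond("cough")
second = patient.respond("cough")
```

A stateless simulator would answer both calls identically; tracking disclosure is what lets an evaluation framework penalize doctors who ask redundant questions.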
LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Shi, Xiaoming, Xu, Jie, Ding, Jinru, Pang, Jiali, Liu, Sichen, Luo, Shuqing, Peng, Xingwei, Lu, Lu, Yang, Haihong, Hu, Mingtao, Ruan, Tong, Zhang, Shaoting
There is increasing interest in developing LLMs for medical diagnosis to improve diagnostic efficiency. Despite their alluring technological potential, there is no unified and comprehensive evaluation criterion, making it impossible to evaluate the quality and potential risks of medical LLMs and further hindering their application in medical treatment scenarios. Besides, current evaluations rely heavily on labor-intensive interactions with LLMs to obtain diagnostic dialogues and on human evaluation of dialogue quality. To tackle the lack of a unified and comprehensive evaluation criterion, we first establish an evaluation criterion, termed LLM-specific Mini-CEX, to assess the diagnostic capabilities of LLMs effectively, based on the original Mini-CEX. To address the labor-intensive interaction problem, we develop a patient simulator to engage in automatic conversations with LLMs and utilize ChatGPT to evaluate diagnosis dialogues automatically. Experimental results show that the LLM-specific Mini-CEX is adequate and necessary for evaluating medical diagnosis dialogue. Besides, ChatGPT can replace manual evaluation on the metrics of humanistic qualities and provide reproducible and automated comparisons between different LLMs.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Oklahoma (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.48)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
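The automated-judging step in the LLM-Mini-CEX abstract above reduces to collecting a per-criterion score from the LLM judge and aggregating. A minimal sketch; the criterion names are illustrative placeholders, not the official LLM-specific Mini-CEX items, and a simple mean stands in for whatever weighting the paper uses:

```python
def score_dialogue(judge_scores):
    """Aggregate per-criterion judge scores (each in [0, 1]) from an
    LLM judge into one overall rating via an unweighted mean."""
    return sum(judge_scores.values()) / len(judge_scores)

# Illustrative judge output for one diagnostic dialogue.
overall = score_dialogue({
    "history_taking": 0.8,
    "humanistic_quality": 0.9,
    "diagnosis_quality": 0.6,
})
```

Because the judge is automated, the same dialogue set can be re-scored cheaply across models, which is the reproducibility claim in the abstract.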
Why isn't AI helping us today with COVID-19? ZDNet
Wouldn't it be great if medical diagnosis could be automated with machine learning and artificial intelligence? Skip the days or weeks of waiting for an appointment, then the questions, the looking, and the poking. Just go online, answer an AI's questions, and get a physical appointment if warranted. From cancelled conferences to disrupted supply chains, not a corner of the global economy is immune to the spread of COVID-19. But like all ML/AI apps, models need training.
Pediatric Hal is a Patient Simulator That Can Bleed, Cry and Sweat - Robot News
To help its nursing students get more real-world experience with pediatric patients, Bunker Hill Community College in Boston is turning to robots. They've been using Pediatric Hal to simulate a 5-year-old male patient. The robot comes from Miami-based Gaumard Scientific, known for their wide variety of medical simulators. Bunker Hill turned to robots partly due to the limited availability of clinical placement slots with patients in the area. Hal is as close to a real patient as you can get.